BMC Genomics
Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match BMC Genomics's content profile, based on 328 papers previously published here. The average preprint has a 0.19% match score for this journal, so anything above that is already an above-average fit.
Meng, F.; Turner, D. L.; Hagenauer, M. H.; Watson, S.; Akil, H.
To detect currently unannotated, lowly expressed genes with high sensitivity and accuracy, we developed a new exon->gene->transcript annotation pipeline that can identify previously undetected multi-exon transcripts from large volumes of RNA-seq data. Our pipeline incorporates three new algorithms: 1) model-based spliced exon detection, 2) exon-to-gene assignment across multiple tissues/datasets through exon community discovery, and 3) ranking of top transcripts by a stepwise minimum flow procedure. The design of our pipeline allowed us to leverage hundreds of terabases of public RNA-seq data as input to improve mouse and rat genome annotation. Using these data, our pipeline identified close to 15K and 21K genes unannotated in GENCODE M37 and ENSEMBL 114 for mouse and rat, respectively. Each species also gained over 200K predicted transcripts containing at least one new exon, although most of these were transcripts of GENCODE/ENSEMBL-annotated genes with newly assigned exons. To make our genome annotation available for common use, we have packaged it in standard file formats for the analysis of bulk and single-cell RNA-seq data (GTF, 10X genome files). We also provide two use cases that demonstrate the utility of our newly annotated genes in functional analyses, showing that their expression can be differentially regulated in relation to cell type and selective breeding. Given the efficiency of our pipeline, we expect that as new RNA-seq data become available in the coming years, it will substantially benefit rat gene/transcript annotation, eventually enabling us to approach the goal of complete gene and transcript annotation.
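The abstract names exon-to-gene assignment by "exon community discovery" but gives no details. As a much simpler stand-in, the sketch below groups exons into putative genes by taking connected components (union-find) of a splice-junction graph, keeping only junctions supported in at least a hypothetical `min_datasets` tissues/datasets. It is not the authors' algorithm: a true community-detection step would also split densely connected subgroups that a connected-components pass merges.

```python
from collections import defaultdict


def group_exons_into_genes(exons, junctions, min_datasets=2):
    """Group exon IDs into putative genes.

    exons: iterable of exon IDs (hypothetical identifiers).
    junctions: iterable of (exon_a, exon_b, n_datasets) tuples, where
        n_datasets is how many tissues/datasets support the splice junction.
    Exons linked directly or transitively by sufficiently supported
    junctions end up in the same group (connected components).
    """
    parent = {e: e for e in exons}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, n_datasets in junctions:
        if n_datasets < min_datasets:
            continue  # too weakly supported to link the two exons
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for e in parent:
        groups[find(e)].append(e)
    return sorted(sorted(g) for g in groups.values())
```

For example, with junctions `("e1","e2",3)`, `("e2","e3",2)`, `("e3","e4",1)` and `("e4","e5",5)`, the e3/e4 junction (seen in only one dataset) is ignored, giving the groups `["e1","e2","e3"]` and `["e4","e5"]`.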
Alvarez Jerez, P.; Rhie, A.; Kim, J.; Hebbar, P.; Nag, S.; Antipov, D.; Koren, S.; Lara, E.; Beilina, A.; Hansen, N. F.; Arber, C. F.; Zulueta, J.; Wild-Crea, P.; Patel, D.; Hickey, G.; Waltz, B.; Malik, L.; Skarnes, W. C.; Reed, X.; Genner, R.; Daida, K.; Pantazis, C. B.; Grenn, F.; Nalls, M. A.; Billingsley, K.; Fossati, V.; Wray, S.; Ward, M.; Ryten, M.; Cookson, M. R.; Jain, M.; Paten, B.; Phillippy, A. M.; Blauwendraat, C.
While induced pluripotent stem cells (iPSCs) have gained popularity for studying neurodegenerative diseases, heterogeneity among the stem cell lines used across studies hampers cross-study comparison. The iPSC Neurodegenerative Disease Initiative (iNDI) selected the KOLF2.1J cell line and prioritized its use as a reference standard for studying the effects of pathogenic variants on cell biology because of its stability and neutral genetic risk for neurodegenerative disease. This cell line and its derivatives, harboring over 100 variants related to Alzheimer's disease, Parkinson's disease, and other neurological diseases, are available for academic and industry access. Current genomic data analyses are limited by the use of a human reference genome that does not capture the complete genetic background of a given iPSC line. Although this issue may in future be partially mitigated by a comprehensive human pangenome, previous work has shown that generating custom genomes is valuable both for characterizing the variation present and for serving as a more appropriate genomic reference. Here, we generated and characterized a custom complete genome assembly of KOLF2.1J. Mapping sequencing reads to this personalized diploid assembly yields more comprehensive mapping than traditional linear references (i.e., GRCh38). In addition, we provide a comprehensive custom gene annotation along with isoform expression and differential methylation analyses across multiple cell types. The assembly and all additional data are browsable and publicly available. This resource will enable more accurate investigation of the KOLF2.1J cell line and of any genomic data generated from it than traditional generalized references allow, while also serving as a foundational approach for establishing custom reference assemblies for other high-value iPSC lines.
Biase, F. H.; Morozyuk, M.; Ezepha, C.
Background: Single-cell RNA sequencing (scRNA-seq) integration methods remove technical variation while preserving biological signal, yet systematic frameworks for evaluating how parameter choices influence biological interpretation remain limited. Traditional benchmarking approaches evaluate a single parameter configuration per method, potentially missing systematic patterns in functional outcomes and method convergence. A framework for systematic evaluation of integration parameters was developed and applied to bovine embryo development. Results: Six integration methods (FastMNN, CCA, RPCA, scVI, Harmony, STACAS), combined with multiple parameters, including those for neighbor identification and clustering, yielded 8,232 combinations. The main outputs evaluated were cell counts for specific cell types and marker identification. After filtering out combinations with extremely poor cell and marker identification, 4,287 integration parameter combinations were retained for analysis. These fell into three major patterns (clusters), with integration methods distributed non-randomly across clusters and yielding distinct biological outcomes. One cluster, composed of scVI and STACAS integrations, was dominated by a failure to identify epiblast cells. Cluster 2 (n = 29), also composed of scVI and STACAS integrations, identified the most epiblast markers (n = 7, 8, or 9) but a limited number of epiblast cells (median = 10). Cluster 1 (n = 4,120 combinations) had the highest method diversity. Across clusters, trophoblast and mesoderm showed high functional distinctness, whereas epiblast and hypoblast showed moderate overlap in gene ontology classes. Conclusions: The approach reveals that parameter choices influence cell type classification, functional interpretation, and the degree of method convergence, with implications for identifying specific biological inferences for further orthogonal validation.
A systematic approach to evaluating integration methods, along with other parameters, is advisable for accurate biological inference.
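The evaluation above amounts to enumerating a full grid of integration method and parameter choices, then discarding runs with extremely poor identification. A minimal sketch of that bookkeeping follows. The method names come from the abstract; the parameter names, their values, and the `n_cells`/`n_markers` result keys are hypothetical placeholders and do not reproduce the authors' 8,232-combination grid.

```python
from itertools import product

# Method names are from the abstract; the parameter names and values are
# hypothetical placeholders, not the grid the authors used.
METHODS = ["FastMNN", "CCA", "RPCA", "scVI", "Harmony", "STACAS"]
N_NEIGHBORS = [10, 20, 30]
RESOLUTIONS = [0.2, 0.5, 0.8, 1.2]


def parameter_grid():
    """Yield one configuration dict per method x parameter combination."""
    for method, k, res in product(METHODS, N_NEIGHBORS, RESOLUTIONS):
        yield {"method": method, "n_neighbors": k, "resolution": res}


def retain(results, min_cells=1, min_markers=1):
    """Drop runs with extremely poor cell or marker identification.

    results: list of dicts carrying hypothetical 'n_cells' and 'n_markers'
    keys (cells assigned to, and markers found for, a target cell type).
    """
    return [r for r in results
            if r["n_cells"] >= min_cells and r["n_markers"] >= min_markers]
```

With these placeholder values the grid has 6 × 3 × 4 = 72 configurations; each would be run, summarized into a result dict, and passed through `retain` before clustering the outcomes.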
Leonard, A. S.; Pausch, H.
Background: Recombination of parental haplotypes is a fundamental biological process that ensures proper segregation of homologous chromosomes and creates new combinations of alleles during meiosis. Crossover events are typically detected from large-scale pedigree-based genetic studies or linkage disequilibrium-based recombination maps, although these are generally limited to SNPs. Increasing amounts of long-read sequencing data and haplotype-resolved assemblies offer an alternative approach to examining recombination events at base-pair resolution, albeit with much smaller sample sizes. Results: Here, we analyse five high-quality genome assemblies from the Simmental cattle breed, including a newly assembled trio-binned HiFi assembly of an Eringer × Simmental cross (N50 of 77 Mb and a k-mer quality value of 55.3). We integrate the five assemblies, two of which originate from maternal half-siblings, into a reference-free Simmental-specific pangenome. By considering path similarities in the pangenome, we were able to identify putative crossover events in the haplotypes of the half-siblings, as well as a greater number of events relative to the cousin, owing to an additional degree of generational separation. We validated the pangenome approach with phased SNPs called from linear alignments of maternal short-read sequencing data, with 23 of 30 chromosomes having the same recombination predictions. We identified 5 Mb and 16.7 Mb of non-reference insertion sequence shared by and private to the half-siblings, respectively, enabling testing for recombination events beyond SNP markers alone. We also identified four differentially methylated CpG clusters from the 5mC signal of HiFi reads, which allowed us to narrow the window containing the putative recombination event within the longest run of homozygosity from 35 to 20 Mb.
Conclusion: Structural variants and methylation information identified from long-read sequencing and genome assemblies may help identify recombination events in regions beyond those typically called from SNPs. Furthermore, although existing long-read-based methylation calls can be noisy and report unrealistic intermediate methylation levels, 5mC methylation appears to be a promising avenue for distinguishing haplotypes in the absence of genomic variation.
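The crossover calls above come from comparing which source haplotype a half-sibling's haplotype most resembles along a chromosome. A minimal sketch of that idea follows, assuming per-window similarity scores to two candidate source haplotypes are already available (the input format and the `min_run` noise filter are hypothetical). It is not the authors' pangenome procedure, only an illustration of turning per-window assignments into switch points.

```python
def call_crossovers(sim_a, sim_b, min_run=3):
    """Return window indices at which the best-matching source haplotype switches.

    sim_a, sim_b: per-window similarity of one haplotype to two candidate
        source haplotypes (hypothetical input, e.g. shared pangenome path
        length per window).
    Runs shorter than `min_run` windows are treated as noise and ignored.
    """
    labels = ["A" if a >= b else "B" for a, b in zip(sim_a, sim_b)]
    runs = []  # [label, start_window, length]
    for i, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1][2] += 1
        else:
            runs.append([label, i, 1])
    kept = [run for run in runs if run[2] >= min_run]
    # A switch between consecutive trusted runs marks a putative crossover;
    # report the first window of the new run.
    return [cur[1] for prev, cur in zip(kept, kept[1:]) if cur[0] != prev[0]]
```

A single discordant window inside a long run is absorbed by the `min_run` filter, so only sustained changes in the best-matching haplotype are reported.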
Haugan, I.; Flatby, H. M.; Lysvand, H.; Skei, N. V.; Zaragkoulias, K.; Solligard, E.; Ronning, T. G.; Olsen, L. C.; Damas, J. K.; Afset, J. E.; As, C. G.
Whole-genome sequencing (WGS) is increasingly used in microbial diagnostics, surveillance, and research. In this paper, we assess the performance of one leading long-read sequencing technology, Oxford Nanopore Technology (ONT), on 836 Staphylococcus aureus bacteraemia isolates and compare the results with those of a leading short-read sequencing technology, Illumina. All isolates were sequenced using ONT MinION Mk1B and Illumina HiSeq or MiSeq. Libraries were prepared according to the manufacturers' instructions. Preprocessing and downstream bioinformatic analyses were performed using a combination of in-house pipelines and publicly available software tools. The average base substitution error rate in ONT assemblies was low but varied between sequence types, possibly owing to lineage-specific methylation patterns. Multilocus sequence typing results were similar between the technologies, whereas ONT assemblies allowed better spa typing than Illumina assemblies. Reported detection rates were similar between ONT and Illumina assemblies for most virulence- and AMR-associated genes and variants. For 42 (22.2%) of 189 genes/variants, the two technologies disagreed on gene detection in 5 or more isolates, and for 39 of these (20.6% of all 189), the higher detection rate was obtained with ONT. Discrepancies were mainly associated with low GC content, multiple repetitive segments, and small plasmids. Polishing of ONT data resulted in minor changes in gene/variant calling. Our study supports the use of ONT WGS for bacterial population genomic studies of large collections of S. aureus isolates. Although assembly of ONT reads is subject to its own methodological limitations, it was superior to Illumina assemblies in detecting potentially clinically relevant genes and variants, at a low read error rate. Understanding the advantages and limitations of WGS technologies is essential before undertaking studies that apply them to large sets of bacteria.
Author summary: In this paper, we present a practical assessment of one important whole genome sequencing (WGS) method, Oxford Nanopore Technology (ONT), and compare its performance in bacterial population genomics to that of WGS with Illumina technology. Our goal was to investigate the usefulness of ONT in studies aiming to identify clinically relevant bacterial characteristics in large collections of bacteria, such as genotype-phenotype studies. We sequenced a large set of clinical S. aureus isolates from episodes of bloodstream infections using both ONT and Illumina technologies and performed analyses with widely used software and bioinformatic pipelines. We have elucidated inherent strengths and limitations of ONT and Illumina sequencing and report some of the practical consequences of these on bacterial typing and detection of clinically relevant genes. With this study, we present one of the most comprehensive assessments of long-read sequencing technology for the genomic characterisation of clinical bacterial isolates, and the findings provide guidance for researchers considering WGS in large-scale bacterial genomics.
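The ONT/Illumina comparison above counts genes/variants whose detection differs in at least five isolates and then asks which technology detected them more often. A minimal sketch of that tally follows, assuming detection calls are stored as a hypothetical mapping from gene to the set of isolate IDs in which it was found; it is an illustration, not the authors' pipeline.

```python
def discordant_genes(calls_ont, calls_illumina, min_isolates=5):
    """Summarise per-gene disagreement between two sequencing technologies.

    calls_*: dict mapping gene -> set of isolate IDs in which the gene was
        detected (hypothetical input format).
    Returns (n_discordant, n_ont_higher): the number of genes whose detection
    differs in at least `min_isolates` isolates, and how many of those were
    detected in more isolates by ONT.
    """
    n_discordant = n_ont_higher = 0
    for gene in calls_ont.keys() | calls_illumina.keys():
        ont = calls_ont.get(gene, set())
        illumina = calls_illumina.get(gene, set())
        # Isolates where exactly one technology detected the gene.
        if len(ont ^ illumina) >= min_isolates:
            n_discordant += 1
            if len(ont) > len(illumina):
                n_ont_higher += 1
    return n_discordant, n_ont_higher
```

Expressed as shares of all 189 genes/variants tested, counts of 42 and 39 correspond to 42/189 ≈ 22.2% and 39/189 ≈ 20.6%.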
Ahmad, A.; mustafa, h.; Khan, W. A.; Manan, A.; Anwer, I.; Akram, W.
Linkage disequilibrium (LD) and haplotype block structure govern the resolution and utility of genomic selection, marker-assisted selection, and genome-wide association studies (GWAS) in livestock. We performed a comprehensive genome-wide characterization of LD decay, haplotype block architecture, and population diversity across all 24 autosomes in Nili-Ravi buffalo (Bubalus bubalis; n = 85), using 43,543 post-quality-control SNPs. Mean genome-wide r² was 0.124 (median 0.074) and mean D′ was 0.540 (median 0.481), with LD half-decay at ~70 kb. A total of 133 haplotype blocks encompassing 721 SNPs were identified (Gabriel et al., 2002). Haploview analysis of nine chromosomes harbouring bTB resistance candidate genes revealed contrasting selection signatures: directional selection at innate immune loci (IFNG, TLR1; H < 0.55) versus balancing selection at adaptive immune loci (BoLA-DRB3, SP110; H > 1.0). Critically, BBU15 Block 3 (28.6 kb; OR52E5/NCR1 locus, 47.16 Mb) showed a genome-wide significant integrated haplotype score (iHS; −log10 p = 5.408), directly co-localising with the published bTB susceptibility QTL (Bermingham et al., 2014). The TAA haplotype (frequency 53.3%) at this block represents a candidate resistance-associated haplotype for marker-assisted selection. These findings provide essential parameters for SNP panel design and bTB resistance breeding in South Asian buffalo.
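The LD summary above reports the squared correlation r² together with a normalised disequilibrium statistic. For reference, the sketch below shows how D, r² and Lewontin's |D′| are conventionally computed for two biallelic loci from a two-locus haplotype frequency and the two allele frequencies. It assumes phased (or EM-estimated) haplotype frequencies are already available; it is not the software the authors used.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2, |D'|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    r2 = d * d / denom
    # D_max depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, r2, abs(d) / d_max
```

For example, `ld_stats(0.3, 0.5, 0.4)` gives D ≈ 0.1, r² ≈ 0.167 and |D′| ≈ 0.5. Because D′ rescales D by its maximum possible value, it is typically much larger than r² for the same marker pair, consistent with the means reported above.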
Haukenfrers, E. J.; Jain, V.; Arvai, S. F.; Patel, K. K.; Gregory, S. G.; Abramson, K. R.; Swain Lenz, D.
The rapidly advancing field of single-cell RNA sequencing (scRNAseq) offers numerous options for transcriptome profiling. However, questions remain as to which chemistry is appropriate for a given experimental goal. Previous single-cell benchmarking studies included earlier methods and involved a mixture of fresh and fixed samples or of probe- and non-probe-based capture methods. However, the inherent differences in sample types and methods limited the conclusions that could be drawn between analogous technologies. Here, we present a novel, systematic comparison of four widely used non-probe-based, non-formalin-fixed scRNAseq assays. We build on past comparisons, which used varied computational pipelines, by applying both platform-specific and platform-agnostic cell calling algorithms for an unbiased comparison of biological and technical replicates from healthy human PBMCs. Our approach evaluates 10x Genomics, Parse Biosciences (QIAGEN), Scale Biosciences (10x Genomics), and Illumina scRNAseq assays on accuracy, sensitivity, precision, power, and efficiency using agnostic and platform-specific cell calling. While metrics vary between assays, each technology has clear advantages and limitations, including experimental time and financial cost. In summary, our study highlights the need for carefully considered project design when using non-formalin-fixed scRNAseq assays, a choice that depends on many factors, including an investigator's specific research aims and available resources.
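The comparison above relies partly on platform-agnostic cell calling, i.e. deciding which barcodes are real cells without each vendor's own software. The sketch below shows one simple, hypothetical heuristic of that kind: a robust high-count reference is taken from the top expected barcodes, and every barcode with at least a fixed fraction of that count is called a cell. The abstract does not say which agnostic algorithm was used; `expected_cells`, `quantile` and `fraction` are assumptions.

```python
def call_cells(umi_counts, expected_cells, quantile=0.99, fraction=0.1):
    """Return indices of barcodes called as cells.

    umi_counts: total UMIs per barcode.
    expected_cells: rough prior guess of how many cells were loaded.
    The `quantile` of the top `expected_cells` barcodes serves as a robust
    estimate of a typical large cell; barcodes with at least `fraction` of
    that count are called cells.
    """
    if not umi_counts or expected_cells < 1:
        return []
    top = sorted(umi_counts, reverse=True)[:expected_cells]
    top.reverse()  # ascending order for the quantile lookup
    idx = min(len(top) - 1, int(quantile * len(top)))
    threshold = fraction * top[idx]
    return [i for i, count in enumerate(umi_counts) if count >= threshold]
```

Applying the same rule to every platform's raw barcode counts removes one source of between-platform difference (vendor-specific cell calling), which is the motivation the abstract gives for using an agnostic method alongside the platform-specific ones.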
Weir, J. A.; Krebs, Y.; Chen, F.
Probe-based single cell RNA sequencing approaches are increasingly becoming a technology of choice for profiling gene expression at scale and in archival tissues. The 10x Genomics Flex v1 assay enables cost-effective and high-sensitivity single-cell RNA sequencing by splitting samples across up to 16 uniquely barcoded probe sets before pooling and loading onto a single lane of a microfluidic chip. A natural consequence of this design is to leverage probe set barcoding as a sample barcoding strategy for case-control experiments. However, we observed that Flex v1 probe set barcode identity drives substantial technical variation between probe set barcodes, an effect that is reproducible across lanes and independent datasets. When Flex v1 probe set barcodes are confounded with biological sample identity, a concerning number of differentially expressed genes at standard thresholds are false positives. The Flex v2 assay, which decouples sample barcoding from probe set hybridization, significantly reduces this artifact. As the field continues to expand adoption of probe-based assays, our findings introduce probe set barcoding as an underappreciated source of technical variation in single-cell assays and emphasize the importance of experimental design when using probe-based sequencing technologies.
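The artifact described above arises when probe set barcode identity is confounded with biological condition. A minimal design check follows, assuming a hypothetical sample sheet of `(lane, probe_barcode, condition)` rows: it lists barcodes that are only ever used for one condition. If every barcode is on that list, barcode and condition effects cannot be separated in a differential expression model. The barcode labels in the example are illustrative.

```python
from collections import defaultdict


def single_condition_barcodes(design):
    """Return probe set barcodes used with only one condition.

    design: list of (lane, probe_barcode, condition) tuples, one per sample
        (hypothetical sample-sheet layout).
    """
    conditions = defaultdict(set)
    for _lane, barcode, condition in design:
        conditions[barcode].add(condition)
    return sorted(b for b, c in conditions.items() if len(c) == 1)


confounded = [(1, "BC01", "case"), (1, "BC02", "ctrl"),
              (2, "BC01", "case"), (2, "BC02", "ctrl")]
balanced = [(1, "BC01", "case"), (1, "BC02", "ctrl"),
            (2, "BC01", "ctrl"), (2, "BC02", "case")]
```

Here `single_condition_barcodes(confounded)` flags both barcodes, whereas the balanced layout, which swaps barcode-to-condition assignments between lanes, returns an empty list and lets the barcode be modelled as a technical covariate.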
Roedelsperger, C.; Agyal, N.; Quiobe, S. P.; Wu, H.; Ibarra-Morales, D.; Sommer, R. J.
Continuous developments in sequencing technologies have led to the generation of chromosome-scale genome assemblies across the whole tree of life, but our ability to annotate genomes has lagged behind. One major problem is that typically not all genes are expressed at detectable levels at any given life stage or in any given environment. Therefore, available transcriptome data need to be complemented by gene prediction programs and protein homology evidence. However, how best to combine these different data types is not well understood. Here, we present a case study in which we community-curated the gene annotations of the Pristionchus pacificus strain RSC011. By incorporating new Iso-seq and RNA-seq data and genome-wide screening, we identified and corrected more than 7,500 (~24%) gene models. While the improved gene annotation for the RSC011 strain will be useful for the P. pacificus community, our study reveals several gene annotation problems that may affect data from other species. Among these, we identified assembly errors, artificial transcript fusions resulting from overlapping genes and polycistronic RNAs, falsely called open reading frames, and error propagation from homology data as frequent sources of gene annotation errors. Thus, our findings may help guide future efforts to annotate genomes across different taxonomic groups.
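One error class listed above, artificial transcript fusions, can be screened for by flagging gene models that span two or more independently supported loci on the same strand. The sketch below is a hypothetical illustration of such a screen (the tuple layout and the idea of using, for example, Iso-seq clusters as "loci" are assumptions), not the authors' curation procedure. It uses a simple quadratic scan; a genome-scale version would use an interval index.

```python
def fusion_candidates(models, loci):
    """Return names of gene models overlapping two or more same-strand loci.

    models, loci: lists of (name, chrom, strand, start, end) tuples with
        0-based, half-open coordinates (hypothetical inputs: predicted gene
        models and independently supported loci).
    """
    flagged = []
    for name, chrom, strand, start, end in models:
        hits = [locus[0] for locus in loci
                if locus[1] == chrom and locus[2] == strand
                and locus[3] < end and start < locus[4]]  # interval overlap
        if len(hits) >= 2:
            flagged.append(name)
    return flagged
```

A model that merges two adjacent loci, such as `("g1", "chr1", "+", 100, 900)` overlapping loci at 100-400 and 500-900, is flagged for manual review rather than corrected automatically.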
Toga, K.; Yokoi, K.; Bono, H.
Eusociality in bees represents a major evolutionary transition, and understanding its molecular basis is fundamental to sociogenomic studies. Comparative genomics has revealed correlations between transcription factor binding site (TFBS) abundance and social complexity; however, when and where these TFBSs function in a eusocial context remains largely unclear. In this study, we performed cap analysis of gene expression (CAGE) during worker metamorphosis in the honeybee Apis mellifera to identify TFBSs within active enhancers and decipher the regulatory relationships between these enhancers and their target genes. We identified 17,349 transcription start sites (TSSs) and 842 candidate enhancers. Using CAGE, we identified five clusters based on expression patterns. Notably, genes associated with the canonical metamorphic regulators Broad complex (Br-c) and E93 were found within specific clusters. By integrating the correlations between enhancer and TSS activities with motif enrichment analysis, we identified 15 transcription factor-enhancer-TSS regulatory relationships. Among these, tramtrack (ttk)-binding sites were identified in five enhancers associated with four target genes, including Br-c; ttk had the most target genes of any transcription factor in our dataset. To examine whether this regulatory relationship is conserved across bee species with varying levels of sociality, we analyzed the sequence conservation of ttk-binding sites in Br-c enhancers and found that perfect conservation of the ttk-binding site was restricted to the genus Apis. The ttk-binding sites of other target genes showed the same Apis-specific conservation pattern. Our findings suggest that gene regulatory relationships during worker metamorphosis arise in a lineage-specific manner in the genus Apis. Significance: Honeybees produce distinct castes (queens and workers) from genetically identical larvae via differences in gene regulation.
Although enhancers have been computationally predicted, their actual activity during bee development has rarely been measured directly, and the CAGE technology has never been applied for this purpose. We identified active enhancers during worker metamorphosis and discovered that the transcription factor ttk may regulate Br-c, a key developmental gene. This study provides the first direct evidence of active enhancers and their regulatory roles in honeybee worker metamorphosis.
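The enhancer-target links above were obtained partly by correlating enhancer and TSS activities across metamorphic stages. A minimal sketch of that correlation step follows, assuming activity profiles (for example CAGE TPM per time point) keyed by hypothetical enhancer and TSS names, and an illustrative `min_r` cutoff. A real analysis would also restrict candidate pairs to a genomic distance window and combine the result with motif enrichment, neither of which is shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-activity signal
    return sxy / sqrt(sxx * syy)


def linked_pairs(enhancers, tsss, min_r=0.8):
    """enhancers, tsss: dict name -> activity per time point (e.g. CAGE TPM).

    Returns (enhancer, tss, r) for every pair whose profiles correlate at >= min_r.
    """
    pairs = []
    for enh, enh_profile in enhancers.items():
        for tss, tss_profile in tsss.items():
            r = pearson(enh_profile, tss_profile)
            if r >= min_r:
                pairs.append((enh, tss, round(r, 3)))
    return pairs
```

With an enhancer profile `[1, 2, 3, 4]`, a TSS rising in step (`[2, 4, 6, 8]`) is linked with r = 1.0, while one falling over time (`[4, 3, 2, 1]`) is not.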
Li, F.; Lima, D.; Bashir, S.; Yadro Garcia, C.; Lopes, A. R.; Verbinnen, G.; de Graaf, D. C.; De Smet, L.; Rodriguez, A.; Rosa-Fontana, A.; Rufino, J.; Martin-Hernandez, R.; Medibees Consortium, ; Pinto, M. A.; Henriques, D.
The western honey bee (Apis mellifera) is an essential pollinator facing unprecedented threats from pesticide exposure. While the evolution of pesticide resistance is well documented in agricultural pests, our understanding of genetic variation in honey bee detoxification systems remains limited. This is a missed opportunity, as harnessing naturally occurring detoxification diversity could provide new avenues for pollinator protection. Cytochrome P450 monooxygenases (CYPs), which are central to xenobiotic metabolism, offer a promising starting point. Here, we present the first comprehensive analysis of CYP genetic diversity in A. mellifera. We analysed the CYPome of 1,467 individuals representing 18 A. mellifera subspecies from 25 countries and identified 5,756 single-nucleotide polymorphisms (SNPs) in 46 CYP genes. Imputed McDonald-Kreitman testing revealed that 56% of non-synonymous CYP substitutions were driven by positive selection. Of the 1,302 haplotypes identified, 84% resided in the CYP3 clan, concentrated in the CYP9 and CYP6AS subfamilies implicated in xenobiotic detoxification. Population-level analyses of nucleotide diversity, Tajima's D selection signatures, FST-based differentiation, and McDonald-Kreitman tests pointed to CYP3 clan genes as the primary locus of adaptive variation. This work provides the first step toward building a comprehensive pharmacogenomic resource for honey bees, enabling the prediction of population-specific pesticide vulnerabilities and the use of naturally occurring detoxification variants to enhance pollinator resilience, a critical step toward sustainable pollinator management.
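The "56% of non-synonymous substitutions driven by positive selection" figure above is an estimate of α from a McDonald-Kreitman comparison. The sketch below shows only the basic α estimator computed from the four MK counts. The imputed MK test named in the abstract refines the polymorphism counts before this step, which the sketch does not attempt; the example counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: non-synonymous and synonymous fixed differences between species.
    pn, ps: non-synonymous and synonymous polymorphisms within the species.
    alpha is the estimated proportion of non-synonymous substitutions fixed
    by positive selection.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1 - (ds * pn) / (dn * ps)
```

With hypothetical counts Dn = 50, Ds = 50, Pn = 10 and Ps = 20, the non-synonymous share is higher among fixed differences than among polymorphisms, and α = 1 − 500/1000 = 0.5.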
Shi, J.; Lu, Z.; Sui, M.; Mu, M.; Zhang, D.; Bao, Z.; Hu, J.; Zeng, Q.; Ye, Z.
Background: Genomic selection (GS) has revolutionized animal breeding, from livestock such as pigs and cattle to aquatic species such as fish and shrimp. However, its broader application across these industries is often constrained by high genotyping costs and reduced predictive reliability across divergent populations or generations. Developing cost-effective, biologically informed genotyping strategies to overcome these limitations remains a critical goal in animal agriculture. Epigenetic annotations, particularly histone modifications, provide direct functional insight into the regulatory elements underlying complex trait variation and represent a promising but underexplored resource for marker prioritization. Results: Here, using the Pacific white shrimp (Litopenaeus vannamei) as a model organism, we conducted a proof-of-concept study integrating resequencing and phenotypic data from 972 individuals. We generated high-resolution epigenomic maps by profiling four histone marks (H3K4me1, H3K4me3, H3K27me3, and H3K27ac) across multiple embryonic stages and adult muscle tissue using CUT&Tag. These functional annotations were then used to prioritize single nucleotide polymorphism (SNP) subsets for genomic prediction. Among the tested strategies, SNPs located in the muscle-specific bivalent promoter/enhancer (E6) state, characterized by the co-occurrence of active and repressive marks, consistently maximized prediction accuracy under the BayesA model. Notably, even at a moderate density (15k), E6-derived SNPs achieved prediction accuracies exceeding those obtained using substantially larger genome-wide SNP sets. Most importantly, in a challenging cross-population validation using an independent strain, the E6-derived SNP subset significantly improved prediction accuracy by 47.6% (from 0.21 ± 0.05 to 0.31 ± 0.04, p < 0.05) compared with random subsets at equivalent density.
Conclusions: These results demonstrate that epigenetic annotation-guided SNP prioritization provides a biologically informed and cost-effective strategy to enhance genomic prediction accuracy and stability. This framework is broadly transferable across species and offers a practical strategy for designing low-density genotyping panels that reduce costs while maintaining reliable selection outcomes in large-scale breeding programs.
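The prioritization step above keeps only SNPs that fall inside regions annotated with a chosen chromatin state (here E6). A minimal sketch of that filter follows, using sorted interval starts and binary search. The coordinate conventions and input layout are assumptions, and the state segments would come from a chromatin-state segmentation not shown here. The helper `relative_gain` reproduces how a relative accuracy improvement is computed.

```python
from bisect import bisect_right


def snps_in_state(snps, intervals):
    """Keep SNPs falling inside annotated chromatin-state intervals.

    snps: list of (chrom, pos) tuples, 1-based positions.
    intervals: dict chrom -> list of (start, end), 1-based inclusive and
        non-overlapping (hypothetical, e.g. all segments assigned to one state).
    """
    index = {c: sorted(iv) for c, iv in intervals.items()}
    starts = {c: [s for s, _ in iv] for c, iv in index.items()}
    kept = []
    for chrom, pos in snps:
        if chrom not in index:
            continue
        i = bisect_right(starts[chrom], pos) - 1  # last interval starting <= pos
        if i >= 0 and pos <= index[chrom][i][1]:
            kept.append((chrom, pos))
    return kept


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, e.g. in prediction accuracy."""
    return (new - old) / old
```

For example, the reported rise in cross-population accuracy from 0.21 to 0.31 corresponds to `relative_gain(0.31, 0.21)` ≈ 0.476, i.e. 47.6%.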
Kaplan, L.; Edgerton, S. J.; Mahoney, B. D.; Ray, C. A.; Reh, T. A.
Transgenic mouse lines are essential for uncovering organ- or system-level genotype-phenotype relationships. Generating such lines by transgene addition can result in insertion into unknown genomic loci, potentially not only disrupting native genes but also attenuating transgene expression. In addition, it often makes transgene zygosity impossible to determine, which in turn complicates breeding and the interpretation of experimental results. In this study, we present two whole-genome sequencing-based pipelines that allow the identification and genotyping of even complex multi-transgene inserts. Because they use widely available reagents and bioinformatic tools, they can easily be applied to develop genotyping strategies in potentially any species.
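The abstract does not describe how the pipelines assign zygosity. One common idea with whole-genome sequencing data is to compare read depth over the transgene with depth over single-copy autosomal sequence. The sketch below assumes depth scales linearly with copy number, that autosomal depth reflects two copies, and that each insert allele carries a known, hypothetical number of tandem copies (`copies_per_insert`). It is an illustration only, not the authors' method.

```python
def call_zygosity(transgene_depth, autosomal_depth, copies_per_insert=1):
    """Classify a transgene insert as hemizygous or homozygous from read depth.

    transgene_depth: mean read depth over the transgene sequence.
    autosomal_depth: mean read depth over single-copy autosomal sequence.
    copies_per_insert: tandem transgene copies per insert allele (assumed known).
    Returns (call, estimated copies per diploid genome).
    """
    copies = 2 * transgene_depth / autosomal_depth
    expected = {
        "hemizygous": copies_per_insert,
        "homozygous": 2 * copies_per_insert,
    }
    call = min(expected, key=lambda k: abs(copies - expected[k]))
    return call, copies
```

With a single-copy insert and 30× autosomal coverage, about 15× depth over the transgene suggests a hemizygous animal and about 30× a homozygous one. Tandem arrays and complex inserts make this ratio less reliable, which is where junction-level analysis of the insert becomes necessary.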
Caliendo, C.; Gerber, S.; Pfenninger, M.
Detecting signals of polygenic adaptation remains a significant challenge in population genomics, as traditional methods often struggle to identify the subtle, multi-locus allele-frequency shifts involved. Here, we introduce and test several novel approaches combining machine learning techniques with traditional statistical tests to detect polygenic adaptation patterns in time series of allele frequency changes from whole-genome data. We implemented a Naive Bayesian Classifier (NBC) and One-Class Support Vector Machines (OCSVM) and compared their performance with that of the classical Fisher's exact test (FET). Furthermore, we combined machine learning and statistical models (OCSVM-FET and NBC-FET), resulting in five competing approaches. Using a simulated dataset based on empirical evolve-and-resequence Chironomus riparius genomic data, we evaluated the methods across evolutionary scenarios varying in number of generations, selection strength, and number of loci under selection. Our results demonstrate that the combined OCSVM-FET approach consistently outperformed competing methods, achieving the lowest false positive rate, the highest area under the curve, and high accuracy. Performance peaked in what we term the late dynamic phase of adaptation (the period after initial selection has occurred but before fixation), highlighting the method's sensitivity to ongoing selective processes and thus its value for experimental approaches. Furthermore, we emphasize the critical role of parameter tuning in balancing biological assumptions with methodological rigor. Our approach offers a powerful tool for detecting polygenic adaptation from time series, e.g., pool-sequencing data from evolve-and-resequence experiments.
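The baseline in the comparison above is Fisher's exact test on allele counts, for example reference and alternative read counts at one site in two time points. A self-contained sketch of the two-sided test on a 2×2 table follows, using the hypergeometric distribution with fixed margins. The small relative tolerance guards against floating-point ties when summing tables "at least as extreme" as the observed one. How the authors combined FET with the classifiers is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be time points and columns reference/alternative allele counts.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the classic table `[[3, 1], [1, 3]]` the two-sided p-value is 34/70 ≈ 0.486. With pool-seq data, read counts are often used in place of allele counts, which inflates significance when coverage exceeds the number of sampled individuals.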
Rodriguez-Vazquez, R.; Mukiibi, R.; Ferraresso, S.; Franch, R.; Peruzza, L.; Rovere, G. D.; Radojicic, J.; Babbucci, M.; Bertotto, D.; Toffan, A.; Pascoli, F.; Penaloza, C.; Houston, R. D.; Tsigenopoulos, C. S.; Bargelloni, L.; Robledo, D.
MicroRNAs (miRNAs) are key post-transcriptional regulators of antiviral immunity, controlling gene expression by targeting the 3′ UTRs of immune-related transcripts. Despite their importance, the role of miRNAs in viral nervous necrosis (VNN) resistance in European seabass (Dicentrarchus labrax) is unexplored. Here, we characterized for the first time the brain miRNome of seabass from three VNN-resistance genotypes (susceptible, intermediate, resistant) across two genetically distinct seabass clusters. Differential expression analyses revealed cluster-specific patterns, with susceptible fish consistently showing overexpression of the differentially expressed miRNAs (DEmiRNAs) compared with resistant fish. Across both genetic clusters, miR-199-5p was differentially expressed between VNN-susceptible and resistant fish, with lower expression in resistant individuals. Functional characterization predicted that this miRNA binds to two distinct miRNA recognition elements (MREs) within the ifi27l2a 3′ UTR. These MREs flank a SNP (Chr3:10,082,380) previously associated with VNN survival. A strong negative correlation (r = −0.840) between miR-199-5p expression and ifi27l2a mRNA abundance further supports a post-transcriptional repression mechanism. Together, these results propose a regulatory model in which miR-199-5p modulates ifi27l2a expression, contributing to phenotypic variation in VNN resistance, and position it as a promising biomarker for seabass aquaculture breeding.
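The MRE prediction above rests on complementarity between the miRNA seed and sites in the 3′ UTR. The sketch below scans a UTR for canonical 7mer-m8 sites, i.e. exact matches to the reverse complement of miRNA nucleotides 2-8. The miRNA and UTR sequences in the example are made up for illustration (they are not miR-199-5p or ifi27l2a). Dedicated target-prediction tools additionally score other site types, site context and conservation, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, utr):
    """Return 0-based start positions of 7mer-m8 sites in a 3' UTR.

    A 7mer-m8 site pairs perfectly with miRNA nucleotides 2-8. Both
    sequences are RNA strings written 5'->3'.
    """
    seed = mirna[1:8]  # nucleotides 2-8 of the miRNA
    site = seed.translate(COMPLEMENT)[::-1]  # reverse complement, as read in the UTR
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

For a hypothetical miRNA `AUCGAUCGGAAAAAAAAAAAAA`, the seed `UCGAUCG` implies the UTR site `CGAUCGA`, which occurs twice in the made-up UTR `GGCGAUCGAUUUCGAUCGAGG` (positions 2 and 12).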
Rodriguez-Vazquez, R.; Karami, A. M.; Robledo, D.; Buchmann, K.
Rainbow trout is affected by a broad range of pathogens causing large economic losses and animal welfare concerns. Marker-assisted selection can significantly enhance resistance to pathogens within a few generations, and to this end many studies have focused on identifying quantitative trait loci (QTLs) for resistance traits. The integration of accumulated genetic resources provides an opportunity to uncover important genetic variation and candidate genes crucially involved in rainbow trout immunity. Here, we present a comprehensive meta-QTL (MQTL) analysis based on the integration of 145 QTLs related to pathogen resistance. These QTLs were refined into 26 MQTLs, of which 15 were validated by genome-wide association studies (GWAS). The average confidence interval (CI) of these MQTLs was reduced 2.03-fold compared with the initial QTLs, improving mapping precision. Integration of GWAS results revealed regions along the rainbow trout genome pivotal for pathogen resistance, including a major region on chromosome 3 that could be used in marker-assisted selection. Further, among the validated MQTLs, we identified a subset of high-confidence MQTLs, defined as those supported by at least three initial QTLs from more than two independent studies, with a percentage of variance explained greater than 8% and a LOD score higher than three. Gene annotation identified 11 unique candidate genes within these high-confidence MQTLs involved in immune pathways, encoding proteins involved in the regulation of immune responses, signalling pathways, receptor activity, and direct immune effector production. The MQTLs and candidate genes identified are valuable resources for advancing molecular breeding and unravelling the genetic basis of pathogen resistance in rainbow trout.
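Meta-QTL analysis narrows confidence intervals by combining QTLs assumed to target the same locus. The sketch below shows the inverse-variance weighting commonly used for this, with each QTL's standard deviation derived from its 95% CI as CI/3.92 (normal approximation). It also encodes the high-confidence criteria listed in the abstract. The abstract does not name the software or exact model used, so the weighting is an assumption and the example positions are hypothetical.

```python
from math import sqrt


def consensus_qtl(positions_cm, ci95_cm):
    """Inverse-variance weighted consensus position and 95% CI (both in cM).

    positions_cm: QTL peak positions projected onto a common map.
    ci95_cm: width of each QTL's 95% confidence interval.
    """
    weights = [1 / (ci / 3.92) ** 2 for ci in ci95_cm]
    total = sum(weights)
    position = sum(w * x for w, x in zip(weights, positions_cm)) / total
    ci = 3.92 * sqrt(1 / total)
    return position, ci


def high_confidence(n_qtl, n_studies, pve_percent, lod):
    """Criteria from the abstract: >= 3 initial QTLs, > 2 studies, PVE > 8%, LOD > 3."""
    return n_qtl >= 3 and n_studies > 2 and pve_percent > 8 and lod > 3
```

Two QTLs at 10 and 20 cM with 3.92 cM intervals combine to a consensus at 15 cM with a CI of about 2.77 cM. The interval shrinks as more consistent QTLs are pooled, which is the mechanism behind the reported 2.03-fold reduction.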
Brate, J.; Grande, E. G.; Pedersen, B. N.; Frengen, T. G.; Stene-Johansen, K.
Here we evaluated the performance of a previously published tiling PCR primer scheme by Ringlander et al. (2022) for whole-genome amplification of Hepatitis B virus (HBV) in combination with Oxford Nanopore sequencing. The primer set originally developed for Ion Torrent sequencing was adapted by removing platform-specific adapters and tested using clinical serum or plasma samples submitted for routine HBV genotyping and resistance testing. Two multiplexing strategies were compared: a single PCR pool containing all primers and a two-pool strategy with non-overlapping amplicons. Sequencing reads were processed using a Nanopore analysis pipeline, and genome coverage and amplicon performance were compared across samples spanning a wide Ct range and representing HBV genotypes A-E. Across all samples, the median genome coverage was approximately 50%, although recovery varied widely, ranging from complete failure to nearly full genomes. Combining all primers into a single PCR reaction, or separating overlapping amplicons into different reactions, had little overall impact on genome recovery, and no consistent differences between the two pooling strategies were observed. In contrast, amplification efficiency differed markedly between individual amplicons. Amplicons 1-5 generally produced higher sequencing depth, whereas amplicons 6-10 frequently showed low coverage and contributed to incomplete genome recovery. Genome coverage was strongly associated with Ct values, with higher coverage observed in samples with lower Ct values, while coverage was broadly similar across genotypes. These results demonstrate that the Ringlander et al. primer scheme can be adapted for multiplex PCR and Nanopore sequencing of HBV, but uneven amplicon performance limits consistent full-genome recovery and highlights the need for further optimization of HBV tiling PCR designs.
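The evaluation above is framed in terms of genome coverage and per-amplicon sequencing depth. A minimal sketch of both summaries follows, assuming a per-position depth list along the HBV reference (for example parsed from a per-base depth table) and amplicon coordinates given as 0-based, half-open intervals. The `min_depth` threshold is a hypothetical choice, as the abstract does not state one.

```python
def genome_coverage(depths, min_depth=10):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def amplicon_mean_depth(depths, amplicons):
    """Mean depth per amplicon.

    amplicons: dict name -> (start, end), 0-based half-open coordinates.
    """
    return {name: sum(depths[s:e]) / (e - s) for name, (s, e) in amplicons.items()}
```

Comparing `amplicon_mean_depth` across samples is one way to expose the pattern described above, where some amplicons are consistently under-represented, independently of the overall coverage fraction.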
Bachler, A.; Walsh, T. K.; Andrews, D.; Williams, M.; Tay, W. T.; Gordon, K. H.; James, B.; Fang, C.; Wang, L.; Wu, Y.; Stone, E. A.; Padovan, A.
Background: The cotton bollworm Helicoverpa armigera is a major global pest controlled by genetically engineered crops expressing Bacillus thuringiensis (Bt) toxins, including Vip3Aa. While Vip3Aa is widely deployed, the genetic basis of resistance remains poorly understood. Previous work identified disruption of a thyroglobulin-like gene (HaVipR1) as one mechanism of resistance, suggesting additional loci may be involved. Results: Using linkage analysis, transcriptomics, long-read sequencing, and CRISPR-Cas9 gene editing, we identify a second thyroglobulin-like gene, HaVipR2, as a novel mediator of Vip3Aa resistance. Resistance in a field-derived H. armigera line was shown to be monogenic, recessive, and autosomal, mapping to chromosome 29. Long-read sequencing revealed a ~16 kb transposable element insertion disrupting HaVipR2, which was undetectable using standard short-read approaches. CRISPR-Cas9 knockout of HaVipR2 conferred >900-fold resistance, confirming its causal role. Comparative analyses show that HaVipR1 and HaVipR2 share conserved domain architecture, indicating that thyroglobulin-domain proteins represent a recurrent target of resistance evolution. Conclusions: Our findings establish thyroglobulin-domain proteins as a new class of Bt resistance genes in Lepidoptera and demonstrate that transposable element insertions can drive adaptive resistance while evading detection by conventional methods. These results highlight the importance of long-read sequencing and accurate genome annotation for resistance monitoring and provide new insights into the molecular basis and evolution of Vip3Aa resistance.
Grebler, E. E. C.; Mongue, A. J.
Recent advances in sequencing technology have made sequencing non-model organisms far more streamlined and feasible. Using these technologies, we begin to address the lack of data on non-model organisms by sequencing the genome of one such species, Phymata mystica (Evans 1931), an ambush bug (Hemiptera: Heteroptera: Reduviidae: Phymatinae) specialized for sit-and-wait predation on flowers. Our genome assembly spans 710 Mb, 99.7% of which is assembled into 14 chromosomal scaffolds. Repetitive elements account for 58.85% of the sequence. We report 26,760 protein-coding genes in a preliminary annotation of the genome. Using these new resources, we explored both macrosynteny and gene conservation. Regarding chromosome structure, we found that P. mystica has a single X chromosome, unlike other well-assembled reduviids, in which the X has apparently split into two linkage groups. In the new annotation, we found a number of venom proteins conserved between P. mystica and other venomous Heteroptera with reference genomes, primarily serine proteases, metallopeptidases, and heteropteran venom family proteins. These results provide a new framework for studying the evolution of venom in this group of insects and further demonstrate the ease with which non-model species can be studied using modern genomic methods.
Rodriguez Vazquez, R.; Gundappa, M. K.; Aramburu, O.; Radojicic, J.; Tsigenopoulos, C. S.; Ferraresso, S.; Franch, R.; Bargelloni, L.; Martinez, P.; Robledo, D.; Megens, H.-J.
Diseases caused by bacterial and viral infections have led to huge economic losses for three of the most important European aquaculture species: turbot (Scophthalmus maximus), gilthead seabream (Sparus aurata) and European seabass (Dicentrarchus labrax). Understanding how these species respond to pathogens is relevant both for advancing aquaculture disease management and for understanding the evolution of the immune response within teleosts. Because mechanisms conserved across species are assumed to play important roles, comparative analysis is a powerful approach for pinpointing key elements of immune defence. Here, we report the first comparative immune-transcriptomic analysis of these three species, 20-24 hours after stimulation with bacterial and viral mimics (inactivated Vibrio anguillarum and Poly I:C, respectively), in the head kidney of live fish (in vivo) and in primary leukocyte cultures (in vitro). RNA-seq analysis revealed a total of 503 differentially expressed orthologous genes in response to Poly I:C in vitro, 1,472 to Vibrio in vitro, 920 to Poly I:C in vivo, and 832 to Vibrio in vivo. Notably, seven genes showed consistent expression patterns across all three species, in both cell culture and live fish, in response to both stimuli. Functional enrichment analysis revealed associations with immunity, DNA replication and repair, and cytokine pathways, with the Toll-like receptor (TLR) pathway common to both conditions and both stimuli. Our study suggests that orthologous gene expression during infection is conserved across the three species for genes involved in chemokine pathways, interferon signalling, antigen processing and presentation, cell signalling regulation, and MAPK cascades. This study provides insights into key immune defence mechanisms in acanthopterygian bony fish.